Learning Structured Predictors from Bandit Feedback for Interactive NLP

نویسندگان

  • Artem Sokolov
  • Julia Kreutzer
  • Christopher Lo
  • Stefan Riezler
چکیده

Structured prediction from bandit feedback describes a learning scenario where instead of having access to a gold standard structure, a learner only receives partial feedback in form of the loss value of a predicted structure. We present new learning objectives and algorithms for this interactive scenario, focusing on convergence speed and ease of elicitability of feedback. We present supervised-to-bandit simulation experiments for several NLP tasks (machine translation, sequence labeling, text classification), showing that bandit learning from relative preferences eases feedback strength and yields improved empirical convergence.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation

We present an approach to structured prediction from bandit feedback, called Bandit Structured Prediction, where only the value of a task loss function at a single predicted point, instead of a correct structure, is observed in learning. We present an application to discriminative reranking in Statistical Machine Translation (SMT) where the learning algorithm only has access to a 1 − BLEU loss ...

متن کامل

Structured Prediction via Learning to Search under Bandit Feedback

We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting in which it only observes a loss, and more fine-grained feedback in which it observes a loss for every action. We find that the ...

متن کامل

Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attentio...

متن کامل

An Interactive Tool for Natural Language Processing on Clinical Text

Natural Language Processing (NLP) systems often make use of machine learning techniques that are unfamiliar to endusers who are interested in analyzing clinical records. Although NLP has been widely used in extracting information from clinical text, current systems generally do not support model revision based on feedback from domain experts. We present a prototype tool that allows end users to...

متن کامل

Counterfactual Risk Minimization

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. Unlike in supervised learning, where the algorithm receives training examples (xi, y ∗ i ) with annotated correct labels y ∗ i , bandit feedback merely provides a cardinal reward δi ∈ R for the prediction yi that the logging system made for context xi. Such bandit feedback is ubiquitous in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016